Text Chunking using Transformation-Based Learning

نویسندگان

  • Lance A. Ramshaw
  • Mitch Marcus
چکیده

Eric Brill introduced transformation-based learning and showed that it can do part-ofspeech tagging with fairly high accuracy. The same method can be applied at a higher level of textual interpretation for locating chunks in the tagged text, including non-recursive “baseNP” chunks. For this purpose, it is convenient to view chunking as a tagging problem by encoding the chunk structure in new tags attached to each word. In automatic tests using Treebank-derived data, this technique achieved recall and precision rates of roughly 92% for baseNP chunks and 88% for somewhat more complex chunks that partition the sentence. Some interesting adaptations to the transformation-based learning approach are also suggested by this application.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Phrase Chunking Using Entropy Guided Transformation Learning

Entropy Guided Transformation Learning (ETL) is a new machine learning strategy that combines the advantages of decision trees (DT) and Transformation Based Learning (TBL). In this work, we apply the ETL framework to four phrase chunking tasks: Portuguese noun phrase chunking, English base noun phrase chunking, English text chunking and Hindi text chunking. In all four tasks, ETL shows better r...

متن کامل

تعیین مرز و نوع عبارات نحوی در متون فارسی

Text tokenization is the process of tokenizing text to meaningful tokens such as words, phrases, sentences, etc. Tokenization of syntactical phrases named as chunking is an important preprocessing needed in many applications such as machine translation information retrieval, text to speech, etc. In this paper chunking of Farsi texts is done using statistical and learning methods and the grammat...

متن کامل

Entropy Guided Transformation Learning

This work presents Entropy Guided Transformation Learning (ETL), a new machine learning algorithm for classification tasks. It generalizes Transformation Based Learning (TBL) by automatically solving the TBL bottleneck: the construction of good template sets. We also present ETL Committee, an ensemble method that uses ETL as the base learner. The main advantage of ETL is its easy applicability ...

متن کامل

Fast Boosting-based Part-of-Speech Tagging and Text Chunking with Efficient Rule Representation for Sequential Labeling

This paper proposes two techniques for fast sequential labeling such as part-of-speech (POS) tagging and text chunking. The first technique is a boosting-based algorithm that learns rules represented by combination of features. To avoid time-consuming evaluation of combination, we divide features into not used ones and used ones for learning combination. The other is a rule representation. Usua...

متن کامل

A Fast Boosting-based Learner for Feature-Rich Tagging and Chunking

Combination of features contributes to a significant improvement in accuracy on tasks such as part-of-speech (POS) tagging and text chunking, compared with using atomic features. However, selecting combination of features on learning with large-scale and feature-rich training data requires long training time. We propose a fast boosting-based algorithm for learning rules represented by combinati...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره cmp-lg/9505040  شماره 

صفحات  -

تاریخ انتشار 1995